9 research outputs found

    Estimating conditional density of missing values using deep Gaussian mixture model

    We consider the problem of estimating the conditional probability distribution of missing values given the observed ones. We propose an approach that combines the flexibility of deep neural networks with the simplicity of Gaussian mixture models (GMMs). Given an incomplete data point, our neural network returns the parameters of a Gaussian distribution (in the form of a Factor Analyzers model) representing the corresponding conditional density. We experimentally verify that our model provides a better log-likelihood than a conditional GMM trained in the typical way. Moreover, imputations obtained by replacing missing values with the mean vector of our model look visually plausible.
    Comment: A preliminary version of this paper appeared as an extended abstract at the ICML 2020 Workshop on The Art of Learning with Missing Values
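    For a jointly Gaussian model, the conditional density of the missing coordinates given the observed ones has a closed form, and its mean is exactly the imputation described above. A minimal NumPy sketch of that conditioning step, with a fixed toy Gaussian standing in for the per-sample parameters the paper's network would predict:

```python
import numpy as np

def conditional_gaussian(mu, cov, obs_idx, mis_idx, x_obs):
    """Conditional distribution of the missing coordinates of a joint
    Gaussian given the observed ones. In the paper, mu and cov (in
    factor-analyzer form) would come from the neural network; here they
    are fixed toy values."""
    mu_o, mu_m = mu[obs_idx], mu[mis_idx]
    S_oo = cov[np.ix_(obs_idx, obs_idx)]
    S_mo = cov[np.ix_(mis_idx, obs_idx)]
    S_mm = cov[np.ix_(mis_idx, mis_idx)]
    K = S_mo @ np.linalg.inv(S_oo)           # regression coefficients
    mean = mu_m + K @ (x_obs - mu_o)         # conditional mean = imputation
    covar = S_mm - K @ S_mo.T                # conditional covariance
    return mean, covar

# Toy 2-D example with strongly correlated coordinates: observing a high
# value in coordinate 0 shifts the imputation of coordinate 1 upward.
mu = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.9], [0.9, 1.0]])
mean, covar = conditional_gaussian(mu, cov, np.array([0]), np.array([1]), np.array([2.0]))
print(mean)   # [1.8]  -> imputation for the missing coordinate
print(covar)  # [[0.19]] -> uncertainty shrinks given the observation
```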

    The general framework for few-shot learning by kernel HyperNetworks

    Few-shot models aim at making predictions using a minimal number of labeled examples from a given task. The main challenge in this area is the one-shot setting, where only one element represents each class. We propose a general framework for few-shot learning via kernel hypernetworks, a fusion of the kernel and hypernetwork paradigms. First, we introduce a classical realization of this framework, dubbed HyperShot. Compared to reference approaches that apply a gradient-based adjustment of the parameters, our models aim to switch the classification module parameters depending on the task's embedding. In practice, we utilize a hypernetwork, which takes the aggregated information from the support data and returns the classifier's parameters tailored to the considered problem. Moreover, we introduce a kernel-based representation of the support examples delivered to the hypernetwork to create the parameters of the classification module. Consequently, we rely on relations between the embeddings of the support examples instead of the direct feature values of the backbone model. Thanks to this approach, our model can adapt to highly different tasks. While this method obtains very good results, it suffers from typical few-shot limitations such as poorly quantified uncertainty due to the limited data size. We further show that incorporating Bayesian neural networks into our general framework, an approach we call BayesHyperShot, solves this issue.
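    The data flow described above can be sketched in a few lines of NumPy. All shapes and the linear "hypernetwork" below are illustrative stand-ins, not the paper's architecture: the point is that the classifier weights are produced from the support set's kernel matrix, and the query is classified through its kernel vector rather than raw backbone features.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(A, B, gamma=1.0):
    """Pairwise RBF kernel between row vectors of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Hypothetical 5-way 1-shot task with 8-dim backbone embeddings.
support = rng.normal(size=(5, 8))        # one embedding per class
query = rng.normal(size=(1, 8))

# Kernel representation of the support set (relations between embeddings,
# not raw features) is flattened and mapped by a toy linear "hypernetwork"
# to the weights of a linear classifier over kernel features.
K_ss = rbf_kernel(support, support)                # (5, 5)
W_hyper = rng.normal(size=(25, 25)) * 0.1          # stand-in hypernetwork
clf_weights = (W_hyper @ K_ss.flatten()).reshape(5, 5)

# The query is classified via its kernel vector against the support set.
k_q = rbf_kernel(query, support)                   # (1, 5)
logits = k_q @ clf_weights.T                       # (1, 5)
pred = int(np.argmax(logits))
print(pred)
```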

    Hypernetwork approach to Bayesian MAML

    The main goal of few-shot learning algorithms is to enable learning from small amounts of data. One of the most popular and elegant few-shot learning approaches is Model-Agnostic Meta-Learning (MAML). The main idea behind this method is to learn the shared universal weights of a meta-model, which are then adapted for specific tasks. However, the method suffers from over-fitting and poorly quantifies uncertainty due to the limited data size. Bayesian approaches could, in principle, alleviate these shortcomings by learning weight distributions in place of point-wise weights. Unfortunately, previous modifications of MAML are limited by the simplicity of Gaussian posteriors, by MAML-like gradient-based weight updates, or by the same structure being enforced for the universal and the adapted weights. In this paper, we propose a novel framework for Bayesian MAML called BayesianHMAML, which employs hypernetworks for weight updates. It learns the universal weights point-wise, but a probabilistic structure is added when they are adapted for specific tasks. In such a framework, we can use simple Gaussian distributions or more complicated posteriors induced by Continuous Normalizing Flows.
    Comment: arXiv admin note: text overlap with arXiv:2205.1574
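    In the simplest (Gaussian) variant, "point-wise universal weights plus a probabilistic adaptation" amounts to a hypernetwork emitting the mean and scale of a Gaussian over task-specific weight updates, sampled via the reparameterization trick. A minimal sketch under that assumption; the toy `hyper` function and all shapes are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)

# Universal (point-wise) weights of the meta-model.
w_universal = rng.normal(size=4)

def hyper(task_emb):
    """Toy stand-in for a hypernetwork: maps a task embedding to the
    parameters of a Gaussian posterior over weight *updates*."""
    mu = 0.1 * task_emb.sum() * np.ones(4)
    log_sigma = np.full(4, -2.0)
    return mu, log_sigma

def sample_adapted_weights(task_emb, n_samples=1000):
    mu, log_sigma = hyper(task_emb)
    eps = rng.normal(size=(n_samples, 4))
    # Reparameterization: w = w_universal + mu + sigma * eps,
    # i.e. a distribution over weights only at adaptation time.
    return w_universal + mu + np.exp(log_sigma) * eps

samples = sample_adapted_weights(np.ones(3))
print(samples.mean(0))  # concentrates near w_universal + mu
print(samples.std(0))   # near exp(-2) ~ 0.135 in each coordinate
```

    A richer posterior (e.g. a Continuous Normalizing Flow, as the abstract mentions) would replace the Gaussian sampling step while keeping the same universal-plus-adapted structure.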

    Augmentation-aware Self-supervised Learning with Guided Projector

    Self-supervised learning (SSL) is a powerful technique for learning robust representations from unlabeled data. By learning to remain invariant to applied data augmentations, methods such as SimCLR and MoCo are able to reach quality on par with supervised approaches. However, this invariance may be harmful for downstream tasks that depend on traits affected by the augmentations used during pretraining, such as color. In this paper, we propose to foster sensitivity to such characteristics in the representation space by modifying the projector network, a common component of self-supervised architectures. Specifically, we supplement the projector with information about the augmentations applied to the images. For the projector to take advantage of this auxiliary guidance when solving the SSL task, the feature extractor learns to preserve the augmentation information in its representations. Our approach, coined Conditional Augmentation-aware Self-supervised Learning (CASSLE), is directly applicable to typical joint-embedding SSL methods regardless of their objective functions. Moreover, it does not require major changes in the network architecture or prior knowledge of downstream tasks. In addition to an analysis of sensitivity towards different data augmentations, we conduct a series of experiments which show that CASSLE improves over various SSL methods, reaching state-of-the-art performance on multiple downstream tasks.
    Comment: Preprint under review. Code: https://github.com/gmum/CASSL
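    The architectural change is small: the projector receives the augmentation parameters alongside the backbone representation. A minimal sketch of that conditioning, with a toy two-layer projector and made-up dimensions (the real CASSLE projector and augmentation encoding may differ):

```python
import numpy as np

rng = np.random.default_rng(1)

def mlp(x, W1, W2):
    """Tiny two-layer ReLU network standing in for the projector."""
    return np.maximum(x @ W1, 0.0) @ W2

feat_dim, aug_dim, proj_dim = 16, 4, 8
W1 = rng.normal(size=(feat_dim + aug_dim, 32)) * 0.1
W2 = rng.normal(size=(32, proj_dim)) * 0.1

representation = rng.normal(size=(2, feat_dim))   # from the feature extractor
aug_params = rng.normal(size=(2, aug_dim))        # e.g. crop box, jitter strengths

# Conditional projector: augmentation information is concatenated to the
# representation, so the SSL objective can be solved in projection space
# without forcing the backbone features to discard augmentation traits.
z = mlp(np.concatenate([representation, aug_params], axis=1), W1, W2)
print(z.shape)  # (2, 8) -> projections used by the SSL loss
```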

    Processing of incomplete data with convolutional neural networks

    No full text
    Training machine learning models on incomplete data is one of the leading problems in this research domain. The issue of missing data arises in many practical applications of such models, such as robotics or the processing of medical images. Adapting deep neural networks, such as convolutional networks, to incomplete data remains an open problem, because naive single-imputation techniques often yield inaccurate results and convey no information about the uncertainty of the imputation. In this work, we present MisConv, a method for processing images with missing regions by deep convolutional neural networks. The proposed method is a generalization of the classical convolutional layer that processes the parameters of the probability density of the missing regions and computes the expected value of the network activation; for the known parts of an image, the layer acts exactly like a standard convolution. An essential component of MisConv is the estimation of the distributions of the missing parts of images. For this purpose, we utilize the Deep Mixture of Factor Analyzers (DMFA) model, which uses a deep neural network to estimate the parameters of those distributions. We compare this approach with other popular models used for this task and show that DMFA can also be successfully trained on incomplete data. We evaluate the proposed method by training models for various image processing tasks (classification, reconstruction, and generation) on incomplete data, and compare their quality against other ways of handling missing data: alternative imputation schemes and different ways of passing the distribution of the missing regions to the target model. The conducted experiments show that convolutional networks equipped with the MisConv layer obtain results better than or comparable to other methods. The part of this work concerning the DMFA model was published in the paper "Estimating Conditional Density of Missing Values Using Deep Gaussian Mixture Model" at the Artemiss: The Art of Learning with Missing Values workshop during the International Conference on Machine Learning (ICML) in July 2020, and at the International Conference on Neural Information Processing (ICONIP) in November 2020. The paper describing the MisConv layer, "MisConv: Convolutional Neural Networks for Missing Data", has been submitted to the Neural Information Processing Systems (NeurIPS) 2021 conference and is currently under review.

    MisConv: convolutional neural networks for missing data

    No full text
    Processing of missing data by modern neural networks, such as CNNs, remains a fundamental yet unsolved challenge, which naturally arises in many practical applications, like image inpainting or autonomous vehicles and robots. While imputation-based techniques are still among the most popular solutions, they frequently introduce unreliable information into the data and do not take into account the uncertainty of the estimation, which may be destructive for a machine learning model. In this paper, we present MisConv, a general mechanism for adapting various CNN architectures to process incomplete images. By modeling the distribution of missing values with a Mixture of Factor Analyzers, we cover the spectrum of possible replacements and find an analytical formula for the expected value of the convolution operator applied to an incomplete image. The whole framework is realized by matrix operations, which makes MisConv extremely efficient in practice. Experiments performed on various image processing tasks demonstrate that MisConv achieves superior or comparable performance to state-of-the-art methods.
    Comment: Accepted for publication at the WACV 2022 Conference
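    The key fact behind the analytical formula for the convolution itself is linearity: E[conv(x)] = conv(E[x]), so the expected response is obtained by convolving the image with missing pixels replaced by their model means. A minimal NumPy sketch with a Monte-Carlo check (the zero mean and unit-variance noise below are toy stand-ins for the Mixture of Factor Analyzers; MisConv additionally handles expectations through nonlinearities, which this sketch does not cover):

```python
import numpy as np

rng = np.random.default_rng(7)

def conv2d_valid(img, kernel):
    """Plain 'valid' 2-D cross-correlation."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i+kh, j:j+kw] * kernel).sum()
    return out

img = rng.normal(size=(6, 6))
mask = rng.random((6, 6)) < 0.3          # True = pixel is missing
kernel = rng.normal(size=(3, 3))

# Expected image: missing pixels replaced by their (toy) model means.
mu_miss = np.zeros((6, 6))               # stand-in for the MFA mean
expected_img = np.where(mask, mu_miss, img)
expected_out = conv2d_valid(expected_img, kernel)

# Monte-Carlo check: averaging conv over samples of the missing pixels
# converges to the convolution of the expected image.
samples = []
for _ in range(4000):
    x = np.where(mask, mu_miss + rng.normal(size=(6, 6)), img)
    samples.append(conv2d_valid(x, kernel))
mc = np.mean(samples, axis=0)
print(np.abs(mc - expected_out).max())   # small: linearity holds
```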

    HyperShot: few-shot learning by kernel hypernetworks

    No full text
    Few-shot models aim at making predictions using a minimal number of labeled examples from a given task. The main challenge in this area is the one-shot setting, where only one element represents each class. We propose HyperShot, a fusion of the kernel and hypernetwork paradigms. Compared to reference approaches that apply a gradient-based adjustment of the parameters, our model aims to switch the classification module parameters depending on the task's embedding. In practice, we utilize a hypernetwork, which takes the aggregated information from the support data and returns the classifier's parameters tailored to the considered problem. Moreover, we introduce a kernel-based representation of the support examples delivered to the hypernetwork to create the parameters of the classification module. Consequently, we rely on relations between the embeddings of the support examples instead of the direct feature values provided by the backbone model. Thanks to this approach, our model can adapt to highly different tasks.

    Zero time waste in pre-trained early exit neural networks

    The problem of reducing the processing time of large deep learning models is a fundamental challenge in many real-world applications. Early-exit methods strive towards this goal by attaching additional Internal Classifiers (ICs) to intermediate layers of a neural network. ICs can quickly return predictions for easy examples and, as a result, reduce the average inference time of the whole model. However, if a particular IC does not decide to return an answer early, its predictions are discarded, with its computations effectively being wasted. To solve this issue, we introduce Zero Time Waste (ZTW), a novel approach in which each IC reuses the predictions returned by its predecessors by (1) adding direct connections between ICs and (2) combining previous outputs in an ensemble-like manner. We conduct extensive experiments across multiple modes, datasets, and architectures to demonstrate that ZTW achieves a significantly better accuracy vs. inference time trade-off than other early-exit methods. On the ImageNet dataset, it obtains superior results over the best baseline method in 11 out of 16 cases, reaching up to 5 percentage points of improvement on low computational budgets.
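    The inference scheme can be sketched as a cascade in which each IC's output is ensembled with its predecessors' before the exit decision. The plain running average and the confidence threshold below are simplified stand-ins for ZTW's learned combination:

```python
import numpy as np

def ztw_style_inference(ic_logits, threshold=0.8):
    """Sketch of ensemble-style early exit. Each internal classifier's
    softmax is averaged with its predecessors' outputs (a stand-in for
    ZTW's learned ensembling), and inference stops once the combined
    confidence exceeds the threshold.
    ic_logits: (num_ics, num_classes), one row per IC in network order.
    Returns (predicted class, number of ICs evaluated)."""
    running = np.zeros(ic_logits.shape[1])
    for depth, logits in enumerate(ic_logits, start=1):
        p = np.exp(logits - logits.max())        # numerically stable softmax
        p /= p.sum()
        running = ((depth - 1) * running + p) / depth   # reuse earlier outputs
        if running.max() >= threshold:
            return int(np.argmax(running)), depth       # early exit
    return int(np.argmax(running)), ic_logits.shape[0]

# Easy example: the first IC is already confident -> exits at depth 1.
easy = np.array([[5.0, 0.0, 0.0], [4.0, 0.0, 0.0]])
print(ztw_style_inference(easy))   # (0, 1)

# Hard example: ICs are unsure -> the full cascade is used.
hard = np.array([[0.1, 0.0, 0.0], [0.0, 0.2, 0.0]])
print(ztw_style_inference(hard))
```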